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Abstract 

The recent developments of basis pursuit and compressed sensing seek to extract infor- 
mation from as few samples as possible. In such applications, since the number of samples 
is restricted, one should deploy the sampling points wisely. We are motivated to study the 
fSJ ■ optimal distribution of finite sampling points. Formulation under the framework of optimal 

reconstruction yields a minimization problem. In the discrete case, we estimate the distance 
between the optimal subspace resulting from a general Karhuncn-Locvc transform and the 
kernel space to obtain another algorithm that is computationally favorable. Numerical ex- 
periments arc then presented to illustrate the performance of the algorithms for the searching 
of optimal sampling points. 
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1 Introduction 



Functions describing natural phenomenon or social activities need to be converted into discrete 
data that can be handled by modern computers. From this viewpoint, sampling is the foundation 
for signal processing and communication. The subject origined from the celebrated Shannon 
sampling theorem [T7] , which gurantees the complete reconstruction of a band- limited function 
■ from its values on some equally-spaced points. The elegant result motivates many follow-up 

studies, making sampling an important research subject in applied mathematics. We shall give 
a brief and partial introduction to the history and progresses. 



■ Mathematically, sampling means to evaluate a function. To ensure the stability, it is arguable 

CN ■ that sampling should only take place in function spaces where point evaluations are continuous. 

Such spaces when endowed with an inner product structure arise in many other areas of math- 
ematics. They are termed as the reproducing kernel Hilbert spaces (RKHS), as by the Riesz's 
^ ■ lemma there exists a function that is able to reproduce the function values through the inner 

^ . product. In Shannon's theorem, the space of functions that are band-limited to [— tt, vr] and are 

equipped with the inner product of L^(M) is an RKHS with the sine function as its reproducing 
kernel. This interpretation gives the hope of searching for Shannon-type complete reconstruc- 
tion formula for other RKHS. It was found in |12] that as long as one has a frame or a Riesz 
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basis formed by the reproducing kernel, then a Shannon-type sampHng formula is immediately 
available by the general theory of frames. They showed that many past sampling formulae can 
be obtained in this manner. Recently, the approach has been generalized to reproducing kernel 
Banach spaces [2U] by frames for Banach spaces via semi- inner-products, j21j . 

Shannon type formulae enable us to have lossless representation of a function that is usually 
defined on an uncountable continuous domain using countable data. Going from uncountable 
to countable is a remarkable progress. However, countable is still infinite and computers can 
not store or handle infinitely many data. This raises the question of how to reconstruct a 
function from its finite sample. For the crucial band-limited functions, two modified Shannon 
series have been proposed in the literature [6l [13] , where it was shown that over-sampling can 
lead to exponentially decaying approximation error. Sampling data often comes with some cost. 
When it is available, one is inclined to use as accurate reconstruction methods as possible. It 
has long been known that in the maximum sense, the best way of reconstruction in an RKHS 
is via the minimal norm interpolation [T^. The approximation error for over-sampling in the 
Paley- Wiener space of band- limited functions is estimated in . 

In this note, we focus on another important question in sampling, which is seldom considered 
in the literature. Usually the number of sampling points in a practical application is limited. 
When that number is fixed, we ask what is the best strategy of deploying the sampling points, 
under the condition that the best reconstruction method is engaged. The study is also motivated 
by the recent development in basis pursuit |4] and compressed sensing [3] , which seek to extract 
information from as few samples as possible. Since the number of samples is restricted, we 
should of course distribute the sampling points wisely. 

We shall formulate the question in the next section. It will become clear that the solution of 
the problem amounts to approximating the subspace spanned by the first few eigenvectors of a 
compact operator. When the operator is of finite rank, the eigenvectors can be obtained by the 
well-known Karhunen-Loeve transform ( also called principal component analysis in engineering) . 
To extend the algorithm to operators usually defined by integrals in this application, we shall 
establish a general Karhunen-Loeve transform in Section 3. An alternative approach by subspace 
approximation that can significantly reduce computational cost will be introduced in Section 
4. Various examples by numerical experiments will be presented in Section 5. The study will 
lead to algorithms for the searching of the optimal distribution of finite sampling points for 
commonly-used RKHS. 

2 Formulation 

A natural choice of background function spaces for sampling is reproducing kernel Hilbert spaces 
(RKHS). Let X be a prescribed metric space where functions of interest are defined. An RKHS 
on X is a Hilbert space % of functions on X such that for each x G X, the point evaluation 
functional 



is continuous. An RKHS % possesses a unique reproducing kernel [T], which is a function on 
X y. X characterized by the properties that for all f £ Ti and x £ X, K(x, ■) £ Ti and 



where (•, •)-^ denotes the inner product on Ti. On the other hand, the reproducing kernel K 
uniquely determines the RKHS Ti. Thus, the RKHS of a reproducing kernel K is usually denoted 
by TiK- For more information on reproducing kernels, see \9\ \W\ [T5]. 



5x(/):=/(x), f£n 
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We emphasize that an RKHS should first be a Hilbert space of functions, which imphes that 
a function in the space has zero norm if and only if it vanishes everywhere. For instance, the 
Paley- Wiener space 

B:={f e C(IR'^) n L^M.'^) : supp / C [-vr, vr]"^} 
is an RKHS. In this paper, the Fourier transform / of / G L^{W^) is defined by 



where x • ^ is the standard inner product on W^. The norm on B inherits from that in 
The reproducing kernel for the Paley- Wiener space B is the sine function 

d 



-i-r sinvrfx, — Uo) 
sine {x,y) = T[ . ^ ' \" , x,y£R^. 



We consider the deployment of finite sampling points in an RKHS in this paper. Let Tix 
be an RKHS on a metric space X and the number n of sampling points be fixed. The choice 
of the sampling points depends on the method of reconstruction and the measurement of the 
approximation error. For most applications, one desires to reconstruct values of the function 
considered on a compact subspace C X. The reconstruction error will be measured by the 
norm in Lfi{i}). Here p G [l,+oo], // is a finite positive Borel measure on $7, and the Banach 
space Lfi{Q) consists of Borel measurable functions / on $7 that satisfy 

i/p 



Lliu) ■■= (^JjfitWdfiit)^ <+oo, l<p<+oo 



and 

11/11^00(^7) '■= inf{c > : I/I < c almost everywhere on Q with respect to fj,} < +oo. 

We shall assume throughout this paper that K is continuous on 0. We observe by the 
reproducing property ()2.ip for all x,y £ X and / G TIk that 

\f{x)-f{y)\ =\{f,K{x,.)-K{y,.))n,\ 

<||/||«^||K(x,-)-m-)llwK 



^^Y^K(x,2;) - K{x,y) - K{y,x) + K{y,y). 

Therefore, every function / G Tix belongs to the space C(fl) of continuous functions on Q 
equipped with the usual maximum norm. Consequently, T-Lk C L^(0) for all 1 < p < +oo and 
all finite Borel measures fi on Q. 

Let X := {xj : 1 < j < n} be a choice of n sampling points. The sample data of a function 
/ G Hk is hence of the form 

:= {fix,) : 1 < J < n}. 

A reconstruction method A is then a mapping from C'^ to L^(il). For a particular / G Hk, the 
reconstruction error is measured by 

Wf-AiiximiL^.m- 
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We then follow the general setting of optimal sampling in [l^ and [19], that is, we measure the 
performance of a reconstruction method A by 

piA,X):=snp{\\f-A{Ix{fmLUn)-- f ^'^k. II/IIw,- < !}• 

Since we are concerned with the optimal choice of sampling points only, we shall try to remove 
the reconstruction method from the picture. To this end, we shall use the optimal reconstruction 
algorithm Ax for each choice of sampling points X. Namely, 

p{Ax-,X) = mi{p{A,X) : among all mapping A from to LP{VL, dp)]. (2.2) 

Finally, our problem reduces to finding the sampling points X that minimizes the function 

£{X) := p{Ax,X), X G X". 

The optimal reconstruction algorithm A is known to be the minimal norm interpolation 
[14^ I19j. The following lemma also gives the reconstruction error. 

Lemma 2.1. For each set of sampling points X £ X", the optimal reconstruction method Ax 
satisfying 112. 2\) is given by 

Ax{1x{f)) ■■= argmin{||5||-H;^ : g G TiK, '^x{g) = Xx{f)}. 

The associated reconstruction error is of the form 

p{Ax,X) := sup{||/||iP(f,) : f (LUk, VWuk < 1, ^^if) = 0}- 

A reproducing kernel determines everything about the corresponding RKHS. The following 
simple observation fulfills this hope. Set 

Sx := span{K{t,-) : t £ X}. (2.3) 

We shall impose another assumption through the paper that for every set of pairwise distinct 
sampling points X, the matrix 

K[X] := [K{xj,Xk) : l<j,k<n] 

is nonsingular. A reproducing kernel is at the same time a positive-definite function, pQ. Thus, 
K[X] is strictly positive-definite. With this assumption, Sx is n-dimensional with the orthonor- 
mal basis 

n 

'^j = ^(^jkK{xk,-), l<j<n, (2.4) 

k=l 

where 

[aj,k ■.l<j,k<n] = {K[X])-'/\ (2.5) 
Corollary 2.2. Let (px be defined by 

(j)x{x) := dist{K{x,-),Sx) ■■= inm{\\K{x,-) - g\\nK ■ g^Sx}, x £ X. 
Then it holds true for each x £ Q that 

^x{x) = sup{|/(x)| ■.f£nK, WfW-HK < 1, ^Af) = 0}. (2.6) 
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Furthermore, for each p > 1 

p{A;r,X)<\\MLUn)^ (2.7) 
and for the special case when p = +00, 

piAx,X) = \\cl)x\\L^(^^y (2.8) 

Proof. Let / be an arbitrary function in Hk such that ||/||-^^- < 1 and Tx{f) = 0. Then by the 
reproducing property ()2. 



(/,i^(x,v))wK =0 for aUl<j<n. 

It follows that / is orthogonal to every g G Sx- We hence see that 

|/(x)| = \{f,K{x,.)-g)n,\ < \\f\\HjK{x,.)-g\\n, < \\K{x, ■) - g\\n, . 

As the above equation is true for all g G Sx, we get that |/(x)| < cpxix), x G X. As a result, it 
holds for all p > 1 that 

piAx,X) < Uxh^^n)- 

On the other hand, letting / be the orthogonal projection of K{x, •) onto Sx and then be 
normalized to a unit vector yields (|2.6p . Thus, for p = +00, 



piAx,^) = sup{sup{|/(x)| ■.xGn}:f€nK, WfWuK < 1, ^xif) = 0} 
= sup{sup{|/(x)| -.feTiK, WfW-HK < 1, ^xif) =o}:xen} 
= \\4'x\\Lf^{n), 



which proves (|2.8p . □ 

By the above corollary, we shall hence try to minimize the quantity ||</'a'||lP(q) as a way to 
bound the intrinsic error p{Ax,^)- A simple calculation tells that 

n 

(f>l.{x) = K{x,x) -^\uj{x)\^, xeX. (2.9) 
i=i 

This together with (|2.4p and (|2.5p gives a function about X that needs to be minimized. The 
complicated form of the function coped with the nonlinearity of the reproducing kernel makes 
directly minimizing this function rather difficult. Before discussing alternative computational 
methods, we present two simple examples to demonstrate that the optimal points might not be 
equally-spaced distributed in the reconstruction domain Q. 

Example 2.3. In this trivial example, we let X = W^, U a compact subset in and n = 1. 
The reproducing kernel is given by a radial basis function 

K{x,y):=ip{\\x-y\\), x,y€R'' 

where \\x\\ denotes the standard Euclidean norm on M*^. The function ip is a univariate function 

that defines a reproducing kernel in the above manner. By Schoenberg's theorem 1 15^ . 

must be a completely monotone function. In particular, (p is nonincreasing. For simplicity, we 
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also assume that (p{0) = 1. We shall use the space (7(0) to measure the reconstruction error. 
The optimal sampling point xq is hence the minimizer of which leads to 



miUfprnd \\<i)t\\%fn\ = minmaxl — \K(t,x)\'^ 

tfcK ll-rt iic(S2) ^^^^ 

= 1 — maxmin \K{t, x)p 
= 1 — maxmin(^^(||x — t\\ 

1 — (j3^(min max \\x — t\ 

■ xen 



By the above equation, xq is the point has a minimal radius r for which 0, C {x : ||x — Xo|| < r}. 
Particularly, for d = 1, we should choose xq as the mid-point of the end points of Q. 

Unlike the above example, our second example shows that nonlinearity could occur as the 
number of sampling points exceeds 1. The analysis of this example of two sampling points is 
already rather tedious but elementary, and is thus omitted. 

Example 2.4. In this example, we let X = M, Q = [a, b] C n = 2 and consider the 
exponential kernel 

K{x,y) :=e-ll"-^ll, x,yeR. 

In this case, for X := {xi,X2}, 

(t)x{x) = 1 - V{x,Xi,X2) 

where 

g-2||3;i-a;|| _|_ g-2||x2-x|| _ 2g-(||a::i-a:|| + ||x2-x|| + ||a;i-X2||) 
V{x,Xi,X2) := 



1 — g-2||xi-a,'2|| 

The optimal sampling points xi,X2 is the minimizer of 

sup min F(x, xi, X2). 

xi,x2m 

Let L := b — a. After some careful but elementary analysis, it can be found that the optimal 
sampling points are 

1 ( -e~^ + ^e-^L^8e-L \ 
xi = a--ln (2.10) 
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and 



Although measuring the reconstruction error by the maximum norm in C(il) seems the most 
natural and the maximum norm dominates other norms, finding the extrema of a multivariate 
function is always difficult. A Hilbert space norm can often save computation efforts. Prom this 
consideration, we restrict ourself to the choice L^{^) in the rest of the paper. In the case when 
^ •= {Uk : 1 < ^' < "1-} ^ X with m ^ n and fi{{yk}) = 1/m for 1 < k < m, the n-dimensional 
subspace Sq that minimizes 

inf < — dist '^{K{yk, ')iS) : S is an n-dimensional subspace of T-Lk \ 
I fc=i J 
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is given by the Karhunen-Loeve transform. More specifically, Sq is spanned by the eigenfunctions 
corresponding to the largest n eigenvalues of the compact positive bounded linear operator T 
on Hk given by 

^ m 

m ^-^ 

k=l 

The process of computing the eigenfunctions and eigenvalues of this operator is also known as 
kernel principal component analysis in machine learning [16]. Of course, the story is not over 
yet as the space we are looking for should be of the form ()2.3p . Our idea is to find sampling 
points X for which Sx best approximates the subspace spanned by the first n eigenfunctions of 
T. Before we estimate the distance between these two subspaces of T-Lk, we first show that for 
general reconstruction error, the minimization problem 

min <! J dist^{K{x, ■),S)dfi{x) : 5 is an n-dimensional subspace of "Hi^j- (2.12) 

can still be reduced to computing the first n eigenfunctions of a compact positive bounded linear 
operator on Tix- We shall prove such a Karhunen-Loeve transform exists for general measure 



3 A general Karhunen-Loeve transform 

The purpose of this section is to show that the subspace that minimizes (|2.12p is spanned by 
the first n eigenfunctions of a compact positive bounded linear operator. We shall prove this 
result under a very general setting. 

Let Ti be an infinite-dimensional separable Hilbert space, fi) be a measure space, that 

is, M is a cr-algebra consisting of certain subsets of 0, and /i is a finite positive measure on M. 
We assume that there is a function F : Q ^ H such that for each u £ Ti, the function 

Lo {F{uj),u)n 

is measurable with respect to and such that ^ F'j^{il,, Ai). For a fixed n G N, we 

want to find an n-dimensional subspace V of H that approximates F{^}) well. By measuring 
the approximation of each candidate subspace V as 

£{V):= [ dist"^ {F{uj),V)dn{uj), 
Jn 

the optimal approximating subspace 5„ is the one that minimizes the above error among all n- 
dimensional subspaces oiTi. A Karhunen-Loeve transform for this general question is presented 
below. 

Theorem 3.1. The operator T :% ^ % determined by 

{Tu,v)u= / iu,F{uj))-HiF{uj),v)ndfJ,{uj), u,v (3.1) 
Jn 

is compact positive bounded linear. The optimal n-dimensional subspace Sn that satisfies 

£{Sn) = inf{if(y) : V is an n-dimensional subspace ofH} 

is given by Sn = spanjcj : 1 < j < n}, where Cj's are the orthonormal eigenfunctions corre- 
sponding to the largest n eigenvalues ofT. 
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Proof. Let v £ Ti he fixed. Then for each u £71, we observe that 



where 



It imphes that 



iu,F{u))n{F{u;),v)ndfi{u;] 



< / \{u,F{u;))n\\{F{u;),v)n\dfi{uj) 
Jn 

< Cf \\u\\n\\v\\H, 
|F(a;)|||,dMM. 



u 



{u, F{u}))y,{F{oj),v)ndfi{oj) 



is a bounded hnear functional on H. By the Riesz representation theorem, there exists a unique 
vector Wv associated with v such that 



{u, F{uj))u{F{u}),v)Hdn{u}) = {u,Wy)n- 



We denote the mapping sending v to Wi, by T. It is clear that this operator is linear. Moreover, 
we have 



\{u,Tv)n\ 



< CFWuWnWvWn- 



(n, F{uj))-h{F{u;), v)-udfi{uj) 
Therefore, HT-yH-^ < Ci;' implying that T is bounded. We also see that for all u £ T-L 

{Tu,u)n = I \{u,F{uJ))H\''d^l{u:)>^. 



Thus, T is positive. 

We next show that T is compact. To this end, let Uj be a bounded sequence in %. Then Tuj 
is bounded as well. As % is reflexive, its unit ball is weakly compact. We may hence assume 
that Tuj converges weakly to some uq 'va.T-L. In other words, 

lim {Tuj,v)'}{ = (uo,f)-H for all v £%. 

We shall prove that Tuj converges to uq strongly in H. Note that 

{Tuj - uo,Tuj - uo)n = (Tuj - no, Tuj)n - {Tuj - uq,uq)u- 
As {Tuj — no, no) — as j — )• oo, it suffices to show that 

lim {Tuj — uo,Tuj)y_ = 0. 

We observe from the definition of T that 



{Tuj - uo,Tuj)'H = / {{Tuj,F{uj))-u- iuo,F{uj))-H){F{uj),Uj)-Hdfi{uj). 
Jn 

For each u £ Q, {Tuj, F{uj))'}{ — t- {uq, F{u}))-}{ as Tuj converges weakly to uq. As a result, there 
holds 

lim \{{Tuj,F{uj))n - iuo,F{uj))n){F{Lo),Uj)n\ 
< Il-^(^)I|-Hsup||nj||-H lim \{Tuj, F{uj))n - [uq, F{uj))n\ = 0- 



Furthermore, 

\i{Tu„F{.))n - iuo,Fiu))n){F{.),Uj)H\ < {\\T\\\\u,f^ + holkl|n,|k)l|i^(-)lll^ e Llin,M). 
The above equations together imply by the Lebesgue dominated convergence theorem that 



\im (Tuj - uo,Tuj)n = hm / {{Tuj,F{uj))-H - iuo,F{uj))-H){F{u}),Uj)'Hdn{uj) = 0. 

Therefore, \\Tuj — tto||-H ~^ j ~^ oo. We have hence proved that T is a positive compact 
bounded hnear operator on H. 

Turning to the last claim of the theorem, we let V be an n-dimensional subspace of Ti with 
the orthonormal basis fj, 1 < j < n. Then 

„ n „ n 

£{V)= / \\F{uj)f^-Y,m^)JM'M^)= / \\F{u;)f^dfi{uj)-Y,{TfjJj)n. 

Thus, the question amounts to finding an orthonormal sequence {/j : 1 < J < ?^} in H that 
maximizes the sum 

n 

i=i 

The analysis of this last part is the same as that for the standard Karhunen-Loeve transform, 
that is, the optimal sequence is achieved by the orthonormal eigenfunctions corresponding to 
the largest n eigenvalues of T. □ 

Returning to the sampling, we specify Q to be a compact subset of the input space X, fi to 
be a finite positive Borel measure on X, to be a continuous kernel on X, and 

F{t) := K{t,-), ten. 
By Theorem 13. H the bounded linear operator T from Tix to T-Lk determined by 

{Tf,g)nK= I f{t)^d^l{t), f,genK 
Jn 

is positive and compact. It is of the explicit form 

(r/)(x)= / f{t)K{t,x)dfi{t), xex, fenK. (3.2) 
Jn 

For each n-dimensional subspace V of T-Lk with the orthonormal basis {uj ■ 1 < j < n}, 

£iV)= I dist\K{t,-),V)dfiit)= [ Kit,t)dfiit)-f2iTu,,Uj)n^. 
Jn Jn j^-^ 

An orthonormal basis for Sx = span {K{xj, •) : 1 < j < n} is given by (j2.4p . Thus, 

„ n n n 

£{V) = / K(t,t)d/i(t)-^^^a,fcaz,(r(K(xfc,.)),A>z,.))wK-. 

7 = 1 k=l 1=1 



Setting 



1/2 



Kky.= {T{K{xkr)),K{xi,-))HK = f K{xk,t)K{t,xi)dfi{t), l<k,l<n, 

Jn 

we conclude that the optimal sampling set X is the solution of 

n 

max^ J2 ajkaijKkj= maxtT(^{K[X])~^^^K{K[X])-'^/'^'^= max^ tr (^K^/^^^f^^^j^-ij^i 

(3.3) 

where tr (M) stands for the trace of a square matrix M. When the eigenfunctions and eigen- 
values of the operator T is known, one has a different formulation of the above optimization 
problem. Let e^, i E I be all the orthonormal eigenfunctions of T with a positive eigenvalue Aj. 
We see for all 1 < /c, / < n that 

^k,i = {T{K{xk,-)),K{xi,-))'Hii = ^{K{xk,-),ei)nK{ei',K{xi,-))'Hii{Tei,ei:)'Hii 

= y^^Xiei{xi)ei{xk). 
iei 

Practically, we are most concerned with the case when Q has finite cardinality that is con- 
siderably larger than n. In this situation, T has finite rank. Assume that I = {1, 2, . . . , m} and 
set 

^ii ■= ^i,i\/>H, Dki := ei{xk), l<i,l<m, I < k < n. 

With these notations, IK = (Z)A)(DA)^. When T is of finite rank, this together with the fact 
that for a square matrix A, tr (AA^) = tr (A^A) yields an equivalent formulation of 



max tr {AD^ {K[X]y^ DA) . (3.4) 

Computing all the eigenfunctions and eigenvalues of T can be costly when m is large. Instead 
of attacking (j3.3p or (j3.4p directly, we shall relax p.4p to use only the n eigenfunctions of T 
corresponding to the first n largest eigenvalues of T, which can often be obtained efficiently by 
the standard Karhunen-Loeve algorithm. Following the idea described at the end of Section 2, 
we shall achieve this by estimating the distance between Sx and the one spanned by the first n 
eigenfunctions of T. 



4 Subspace approximation 

We now let Cj, 1 < j < n he the orthonormal eigenfunctions of T, defined as in ()3.2p . corre- 
sponding to the largest n eigenvalues of T. We assume that these eigenvalues are positive. By 
Theorem 13. H the subspace St '■= spanjcj : 1 < i < n} is a minimizer of optimization problem 
(|2.12p . We wish to find sampling points X such that £{Sx) — £{St) is small, where for a closed 
subspace V of T-Ik, 

£{V)= [ dist^{K{t,-),V)dfi{t). 
Jn 

To this end, we first observe that for any closed subspaces U and V of T-Lk, l^iU) — SiV)\ can 
be bounded by the subspace distance between U and V. 
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Denote by Py the orthogonal projection operator from T-Lk onto V . The distance between 
two closed subspaces U and V of Hk is defined by 

dist(C/,y) := \\Pu-Pv\l 
where \\Pu — PyW is the operator norm of Pu — Py, that is, 

IIP p II- \\PuU) - PvU)\\Hk 
\\Pu - Py\\ - sup — . 

J&Hk WJWHk 

Apparently, the above supremum can be restricted to the closed subspace spanned by the union 
of U and V. 

Lemma 4.1. It holds for any two closed subspaces U and V ofT-Lx that 

\£{U) - £{V)\ < 2Kn dist (^7, V), (4.1) 

where 

Kn := / K{t,t)dfi{t). 
Jn 

Proof. Denote by / the identity operator. We estimate that 

I dist 2(if(t, .),[/) - dist'^{K{t,-),V)\ 
= |||(/ - Pu)Kit, Oil?,, - ||(/ - Pv)Kit, OIlLI 

< 11(1 - Pu)Kit, •)-(/- Pv)K{t, -nnAWil - Pu)Kit, -^n, + \\{I- Py)K{t, Oll^,) 

< \\Pu-Py\\\\K{t,-)\\nA\\Kit,.)\\H. + \\K{t,.)\\n,,) 
= 2K{t,t)dist{U,V), 

from which (|4.1|) follows. □ 

According to the above lemma, we face to figure out the distance between subspaces Sx and 
St- To this end, we introduce some notations. Set 

n n 

fj ■= ^{K{xj, ■),ek)nK^k = ^ ek{xj)ek, I < j < n. 

k=l k=l 

In other words, fj is the orthogonal projection of K{xj, •) onto St- Also, set 

hj := K{xj, ■) - fj, 1 <j < n. 

Accordingly, we define two positive definite matrices by letting 

A := [{fk, fj)HK ■'^ < j,k <n\ and B := [{hk, hj)uji ■l<j,k<n] 

We shall assume that A and B are both nonsingular. It will be shown in the proof below that 
A + B = K[X]'^. We assume in this section that K[?(!] is nonsingular as well. 

Lemma 4.2. // the matrix E := [ek{xj) ■ 1 < j,k < n] is nonsingular then 



*''^-^-' = f -WW(EE^^ (4.2) 

where \max{^) denotes the largest eigenvalue of a square matrix M . If 'E is singular then 
dist {Sx, St) > 1. 
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Proof. If E is singular then there exists a nonzero function in Sx that is orthogonal to St- It 
follows immediately that dist {Sx,St) > 1- 

Suppose that E is nonsingular. By the nonsingularity of E, St is identical with the following 
subspace of Hk- 

U := span{/j- : 1 < j < n}. 

The space Sx coincides with U := span{/j + hj : 1 < j < n}. For later use, we also introduce 
another two subspaces of TIk- 

V := span{/ij ■ 1 < j < n} and W := span{/j, hj : 1 < j < n}. 

We first observe that 



dist {U,U) = sup 



\Pu{w) - P^iw)\\n^ 



Any w £ W can be represented as w = u + v, where u £ U and v £ V. By definition, we have 
(/j, hkj-HK — ^ 1^ j, k < n, which yields that U is orthogonal to V. We get that 

dist [U,U) = sup ; ^2 — 



Ijyu -r v)\\ '^ 
ueU,v£V \\u\\nK + ll^ll-Hx 



sup — r— "2 []— ^- (4.3) 



To estimate ()4.3p . we first give -P^;(ti + explicitly. To this end, we assume that 

n 

P^iu + v) = Y, Ckifk + hk) (4.4) 

k=l 

for some c,- G C. By the characterization of orthogonal projections, we get the equations 



{u + v Ckifk + hk), fj + hj)nK =0, 1 < i < n, 

A;=l 

which leads to 

n 

X] Cfc((/fc, /j)-HK + {hk, hj)nK) = {u, fj)'HK + {v, hj)-HK, 1 < i < n. (4.5) 
k=l 

Set c := [cfc : 1 < /c < n]-^. For each 

n n 

X -Ufc/fc £U and t; = X TJ^/ifc G ^, 



n 

U = 

k=l k=l 



we set u := [uk : 1 < A; < ii]'^ and v := [v^ : 1 < /c < Ji]-^. Then equation (|4.5|) can be rewritten 
in a matrix form 

(A + B)c = Au + Bv. 

Thus we obtain 

c = (A + B)~^(Au + Bv). (4.6) 
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By (|4.4p . we have 



\u-P^iu + v)f^^ 



\u 



k=l 

n n 



k=l 



k=l 



k=l k=l 

= (u-c)*A(u-c)+c*Bc. 
Substituting (|4.6p into the above equation, we get that 



\u - Pq{u + v)\\'^^ = u*Au- u*A(A + B)-^(Au + Bv) 

-(Au + Bv)*(A + B)~^Au 
+ (Au + Bv)*(A + B)"^(Au + Bv) 
= u*Au - u*A(A + B)~^Au + v*B(A + B)~^Bv. 



Together with the fact that 



the above equation leads to 



dist^ {U,U)= sup 

u,v6C' 



Hhk = V*Bv 



u* Au - u* A(A + B)-i Au + v*B(A + B)-iBv 



u* Au + v*Bv 

Let a := A^/^u and b := B^/^v. By introducing a matrix 



M : = 



I- aV2(a + b)-iaV2 

B1/2(a + B)-1bV2 



we get that 



dist^(C/,C/) = sup 

a,beC" 



a 
b 



M 



a 
b 



a 
b 



l|M||, 



(4.7) 



(4.8) 



where || • ||2 denotes the standard Euchdean norm of a vector or the spectral norm of a square 
matrix. On the one hand, we have 



|I_aV2(a + B)-1aV2||. 



||A-V2b(A + B)"1a1/2||2 
Amax(A-V2B(A + B)-iAi/2). 



Since the matrix A is nonsingular, the matrix A ^/^B(A + B) ^A^^'^ has the same eigenvalues 
with the matrix B(A + B)~^. Hence, we have 



|I - Ai/2(A + B)-'A'/% = Amax(B(A + B)-^). 



(4.9) 
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On the other hand, by the nonsingularity of the matrix B, we also have 

||b1/2(A + B)-1bV2||2 = Amax(BV2(A + B)-iBi/2) 

= Amax(B(A + B)-i). (4.10) 

Combining (|4.9p with (j4.10p . we get that 

dist\U, U) = Amax(B(A + By'). 
For each 1 < j,k < n, there holds 



K{xk,Xj) - fj{xk) - fk{xj) + ifkjj) 



H 



K 



= K{xk,Xj) - fj{xk), 
which leads to B = K[X]'^ — A. Hence, we obtain 



1 



Amax(/^[Af]^A-i)- 

It follows from A = EE* that there holds (S^. □ 



Combining Lemmas 14.11 and 14.21 we obtain a bound for the distance between Sx and the 
optimal subspace St and give the last optimization problem for the searching of optimal sampling 
points. 

Theorem 4.3. IfEis nonsingular then 



£iSx)-£iST)<2KnJl 



1 



We conclude that the subspace approximation approach leads to the following problem 

mm Amax(i^[^]^(EE*)-i) (4.11) 

to be solved for the searching of optimal sampling points. We remark that when the measure 
fj, is discrete as in most practical applications, (|4.1ip is computationally favorable over ()3.3p . 
The reason is that in this case, an orthonormal basis for the optimal subspace St can be easily 
computed by the Karhunen-Loeve transform. At each stage of searching for the candidate 
sampling points X, the matrix E can be obtained efficiently and the major computation occurs 
with taking the inverse of a matrix. As comparison, algorithm ()3.3p additional requires the 
computation of the matrix K and its square root. 



5 Numerical Experiments 

In this section, we give some numerical experiments to illustrate the performance of algorithms 
(j3.3p and (|4.1ip for the searching of optimal sampling points. To this end, we first recall by 
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Lemma l2.ll that for an obtained n samphng points X = {xj ■ 1 < j < n} £ X"', the optimal 
method of reconstructing / of a given function / G T-Lk from the sampled data /(Af) is given by 

n 

fix) ='^o:jK{xj,x), x£X, (5.1) 
i=i 

where the coefficients ctj, 1 < i < n, are the unique solution of the linear system 

n 

'^K{xj,Xk)aj = f{xk), l<k<n. (5.2) 
i=i 

Here we assume throughout the section that the kernel matrix K[X] is nonsingular. 

Therefore, our procedure of experiments is as follows. We shall consider the Gaussian kernel 

K{x,y) = e-ll^-J'll', x,yeM.'^ 

and the sine kernel 

TT sin7r(x,- — y,) 

Let K be one of these two kernels, X = G M*^ be compact, and /i be a selected Borel measure 
on Q. We then solve the optimization problem (j3.3p or (j4.1ip to obtain n sampling points ^^opt' 
which are to be compared with the commonly used equally-spaced sampling points Afequ- For 
this purpose, we randomly generate 100 finite linear combinations / of the kernel 

as the target functions to be sampled, where both the coefficients Cj's and the locations Zj's will 
be randomly generated by the uniform distribution. For each of those target functions /, we 
then compute by ()5.ip and ()5.2p the reconstructed functions /gp^ and /equ from the sampled 
values of / on ^opt '^equ; respectively. Finally, the relative approximation errors 

„ _ ll/opt - f\\L2{n) _ ll/equ - fWi^in) 

"^opt '■- ijTjj— ' "^equ :- 



\L^{n) ll/llL2(n) 
are calculated. 

To present the results, we shall first plot ^opt against ^^equ- The mean and standard 
deviation of the difference Seqn — ^opt pairs of relative errors will then be tabulated. 

Finally, we plot the 100 pairs of relative errors for a visual comparison, followed by discussion. 

Experiment 1: algorithm ( 13.31) . K = the one-dimensional Gaussian kernel, 
= 12, r2 = [—3,3], /i = the Lebesgue measure on il. 

Figure 5.1 Distribution of the obtained 12 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on Q = [—3, 3]. 
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* opt 
equ 



0-*O *0 * *O *0 *0 Of 0* * 0* * 0*- 



-3 -2.8-2.6-2.4-2.2 -2 -1.8-1.6-1.4-1.2 -1 -0.8-0.6-0.4-0.2 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 

Table 5.1 The mean and standard deviation of the improvernent iSequ — "-^opt • 



mean standrad deviation 

0.3705 X 10^3 0.5049 x 10"^ 



Figure 5.2 Relative approximation errors <fopt (mai'ked with a circle) and <?equ (marked 
with a star). 




10 20 30 40 50 60 70 80 90 100 



We observe that for the 100 pairs of relative approximation errors, there are only 20 pairs for 
which i^opt is larger than ^equ- Recall that the optimal sampling points are designed to ensure 
that it is best in average for all the functions in the RKHS T-Lk- Therefore, situations where the 
optimal sampling points perform worse than the equally-spaced sampling points could indeed 
occur. For this experiment, one sees that in those 20 instances, the relative errors <fopt ^'^d £'equ 
are comparable. More importantly, for all the instances where the relative error corresponding 
to the equally-spaced sampling points exceeds 1 x 10~^, the usage of the optimal sampling points 
can always bring down the relative error to below 1 x 10~^. We conclude that for this example 
the obtained optimal sampling points are superior to the equally-spaced points. 
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Experiment 2: algorithm ( 13. 3p . K = the one-dimensional Sine kernel, n = 8, 

= [—3,3], /i = the Lebesgue measure on fi. 

Figure 5.3 Distribution of the obtained 8 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on = [—3,3]. 



- * O G 



* opt 
O equ 



o * o * - 



-3 -2.8-2.6-2.4-2.2 -2 -1.8-1.6-1.4-1.2 -1 -0.8-0.6-0.4-0.2 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 

Table 5.2 The mean and standard deviation of the improvement <Sequ ~ "^opt- 



mean standrad deviation 
0.0018 0.0026 



Figure 5.4 Relative approximation errors <fopt (mai^ked with a circle) and <Sequ (marked 
with a star). 



0.015 




0.005 



70 80 90 100 



For the 100 pairs of relative approximation errors, there are 23 pairs for which ifgpt larger 
than <?equ- There are 34 <?equ (compared to 10 <fopt) that are larger than 5 x 10~^. And in 26 
instances among those 34, replacing the equally-spaced points with the optimal sampling points 
reduces the relative approximation error to below 5 x 10"'^. We also conclude that for this 



17 



example the obtained optimal sampling points perform better than the equally-spaced points, 
although the improvement is not as drastic as Experiment 1. 

Experiment 3: algorithm (13.31) . K = the two-dimensional Gaussian kernel, 
n = 36, r2 = [—2,2] x [—2,2], fx = the Lebesgue measure on Cl. 

Figure 5.5 Distribution of the obtained 36 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on [—2, 2] x [—2, 2]. 
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Table 5.3 The mean and standard deviation of the improvement f equ — "^opt ■ 

mean standrad deviation 
0.0028 0.0043 



Figure 5.6 Relative approximation errors <fopt (marked with a circle) and <?equ (marked 
with a star). 
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For the 100 pairs of relative approximation errors, there are 23 pairs for which <fopt 
larger than fequ- In these pairs, Sequ and ^^opt rather close. We see that the value of the 
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optimal sampling points lies in that they could dramatically reduce the relative error when the 
equally-spaced points perform badly. There are 10 such examples in Figure 5.6. 

Experiment 4: algorithm (13. 3h . K = the two-dimensional Sine kernel, n = 25, 

= [—2,2] X [—2,2], /i = the Lebesgue measure on f2. 
Figure 5.7 Distribution of the obtained 25 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on = [—2,2] x [—2,2]. 
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Table 5.4 The mean and standard deviation of the improvement i?equ — "^opt- 



mean standrad deviation 
0.0020 0.0045 



Figure 5.8 Relative approximation errors "fgpt (marked with a circle) and <?equ (marked 
with a star). 
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In the 100 pairs of relative approximation errors, there are 31 pairs for which ifgpt larger 
than £eq\i- We see from Figure 5.7 that for this example, the obtained optimal sampling points 
are rather close to the equally-spaced points. As a consequence, the relative approximation 
errors shown in Figure 5.8 are comparable. 

In the following, we present two experiments about algorithm ()4.1ip . 

Experiment 5: algorithm ( 14. lift . K = the one-dimensional Gaussian kernel, 
n = 12, Q = [—3,3], /i is the uniform discrete measure supported at the 30 
equally-spaced points in Q. 

Figure 5.9 Distribution of the obtained 12 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on = [—3,3]. 
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Table 5.5 The mean and standard deviation of the improvement <Sequ ~ "^opt- 

mean standrad deviation 

0.7528 X 10^^ 0.9825 x 10^^ 

Figure 5.10 Relative approximation errors "fgpt (marked with a circle) and ^equ (marked with 
a star). 
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In the 100 pairs of relative approximation errors, there are only 16 pairs for which i?opt 
larger than i?equ- One sees that in those 16 instances, the relative errors Sq^i and i?equ a-re 
comparable. For the remaining 84 instances, the improvement brought by the optimal sampling 
points resulting from algorithm (|4.11|) is drastic. In particular, there are 39 instances where <?equ 
exceeds 10~^ while only three i^opt Comparing results here with those in Experiment 1, 

one sees that algorithm (|4.1ip is superior to (j3.3p for this problem. 



Experiment 6: algorithm (I4.11j) . K = the one-dimensional Sine kernel, n = 8, 
Q = [—3,3], /i is the uniform discrete measure supported at the 20 equally- 
spaced points in Q. 

Figure 5.11 Distribution of the obtained 8 optimal sampling points (marked with a star) 
and the equally-spaced points (marked with a circle) on = [—3,3]. 
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Table 5.6 The mean and standard deviation of the improvement <?equ ~ "^opt- 



mean standrad deviation 
0.0035 0.0063 



Figure 5.12 Relative approximation errors i^opt (marked with a circle) and ifequ (marked with 
a star). 
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In the 100 pairs of relative approximation errors, there are only 28 pairs for which <fopt 
larger than fequ- Except for 5 outliers, ^opt/^equ < 5 for those instances. For the remaining 
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72 improved instances, there are 21 for which "^equZ-^opt > ^ ^ which 'fequ/'^opt > 
We conclude that the optimal sampling points yielding from algorithm (|4.11|) are significantly 
better than the equally-spaced points. The results here outperform those in Experiment 2. 
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6 Appendix: proof of Example 12.41 

We shall prove that the optimal sampling points for Example 12.41 are given by ()2.10p and ()2.1ip . 
The proof is done by considering each case of the relative location of the two sampling point 
with respect to the reconstruction domain Vl = [a, h]. 

Case 1: xi,X2 lie on the right hand of $7. We set u = xi — h and r = X2 — xi. Then there holds 

g-2{M+L) _|_ g-2{«+r+L) _ 2g-2{«+r+L) 

m.\nV{x,xi,X2) = 

xgq 1 — e 

^ g-2(n+L)^ 

It is easy to see that 

sup mmV{x,xi,X2) = e~'^^ (6.1) 

and the supremum is achieves when u = 0. 

Case 2: xi,X2 lie on the left hand and the right hand of Q, respectively. We set t = a — xi and 
s = X2 — b. For each x £ il., we also let u = x — a. By these notations, we get that 

g-2{n+t) _^ g-2{L-«+s) _ 2g-2(L+s+t) 

miny(x,xi,X2) = min -r^- — -r . 

If t > L + s, we obtain that the minimum achieves at n = and 

e-2t + g-2(L+s) _ 2g-2(L+s+t) 

nnnV{x,x,,X2) = ^ _ g-2(L+.+t) ' 

which is decreasing with respect to s and t. Hence, we get the conclusion that 

2e-2i 

sup miny(x,xi,X2) = ^tt^, 

x^n 1 + e-2^ 



Xl,X2 
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where the supremum achieves at s = and t = L. Similarly, for the case when s > t + L, 
we also get that 

sup minl/(x,xi,X2) = zprr- 

xi,x2eRd^en 1 + e 

For the case when |s — t| < L, the minimum achieves at n = and there holds 

2g-(i+s+t) 

By taking the supremum of the above equation, we have 

m_iny(x, xi, X2) = - 

It follows from the inequality 



sup minyfx, xi, X2) — r- 
xen 1 + e ^ 



2e-^ 2e~2L 
> 



l + e~^ ' 1 + 
that in case (2), there holds 

-L 



2e" 

sup min y(x, xi, X2) = 37-, (6-2) 

xi.xaGM'' 1 + e ^ 



Case 3: xq, xi S 0. We set m = xq — a, v = 6 — xi and r = xi — xq- If x = a, we have that 

y (x, Xi, X2) = :t- = e 

1 — e 

Similarly, we also get for x = b that 

y(x,xi,X2) = e~^". 

If X S [xi,X2], there holds 

^-2\x-xo\ _|_ ^-2(r-\x-xo\) _ 2e~2^ 

y(x,Xi,X2) = ^ . 

1 — e 

Thus the minimum of V achieves at |x — xo| = § and there holds 

2e-^ 



min y(x,xi,X2) 



xe[xi,x2] 1 + e ^ 

According to the above discussion, we need to consider 

. / _2n -2v 2e-(^-"-^') 1 

sup mm < e , e , 77 7 > . 

u>v \ ' 1 + e-(^-«-^) J 

It is not difficult to see that 



24 



where there holds 

e-2« = ^ ^ . (6.3) 

By solving equation (|6.3p . we obtain 



2 

and 



= — - In 



. j _2n -2^, 2e-(^-"-^) I -e"^ + Ve-2i + 8e-^ ,^ ^, 



It remains to compare ()6.ip . ()6.2p and ()6.4p . By calculation, we have 



2 1 + e--^ 



which implies the optimal two sampling points should be placed inside $7 by (j2.10p and ()2.1ip . 
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